Here we model an intervention to address malnutrition among school-aged children in urban Hanoi. The model follows the outline developed in a workshop in July 2023. A test run of the intervention will be carried out by the Center for the Development of Organic Agriculture (CODAS) under the Association of Organic Agriculture of Vietnam.

Should urban Hanoi school boards invest time and money in creating school gardens?

The gardens will be designed for teaching about nutrition and STEM (an approach to learning and development that integrates science, technology, engineering and maths) at primary and secondary schools. In the 2nd year the garden is expected to start running well. The 3rd year is when the education plan will be fully running.

Methods

We use several different functions to create a working model of the garden intervention.

Chance Event Function: Generating Conditional Values

The chance_event function generates conditional values with:

\[ occurrence = \begin{cases} rbinom(length(value\_if), 1, chance) & if !one\_draw \\ rbinom(1, 1, chance) & if one\_draw \end{cases} \]

and final values with:

\[ chance\_event = occurrence \cdot value\_if + (1 - occurrence) \cdot value\_if\_not \]

Value Varier Function: Generating Synthetic Data

source("functions/vv.R")

The vv function generates synthetic data with varying means and coefficients of variation with:

  • \(n\): Number of data points to generate.
  • \(var\_mean\): Mean value for the data.
  • \(var\_CV\): Coefficient of Variation (CV) for the data.
  • \(distribution\): Distribution of the data (e.g., normal distribution).
  • \(absolute\_trend\): Absolute trend to add to the data.
  • \(relative\_trend\): Relative trend to apply to the data.
  • \(lower\_limit\): Lower limit for generated data.
  • \(upper\_limit\): Upper limit for generated data.

These are used to calculate annual means:

  • If both absolute and relative trends are absent, the annual means remain constant:

    \[ annual\_means = var\_mean \]

  • If an absolute trend is present, annual means are adjusted linearly:

    \[ annual\_means = var\_mean + absolute\_trend \cdot (0, 1, \ldots, n-1) \]

  • If only a relative trend is present, annual means change multiplicatively:

    \[ annual\_means = var\_mean \cdot (1 + relative\_trend / 100)^{(0, 1, \ldots, n-1)} \]

  • Generate random data points based on the specified distribution using calculated annual means:

    \[ out \sim \mathcal{N}(annual\_means, \lvert annual\_means \rvert \cdot \frac{var\_CV}{100}) \]

  • If specified, limit data points to the defined lower and upper limits:

    \[ out = \max(out, lower\_limit), \quad out = \min(out, upper\_limit) \]

Net Present Value (NPV)

source("functions/discount.R")

The discounted value of a time series value \(x_t\) at time \(t\) can be calculated using the formula:

\[ \text{Discounted Value}_t = \frac{x_t}{\left(1 + \frac{\text{Discount Rate}}{100}\right)^{t-1}} \]

Where:

  • \(x_t\) is the value at time \(t\),

  • \(\text{Discount Rate}\) is the discount rate provided to the function,

  • \(t\) is the time period (starting from 1).

  1. Net Present Value (NPV) (if calculate_NPV is set to TRUE): The Net Present Value of a time series with discounted values can be calculated as the sum of all discounted values:

    \[ \text{NPV} = \sum_{t=1}^{n} \frac{x_t}{\left(1 + \frac{\text{Discount Rate}}{100}\right)^{t-1}} \]

    Where \(n\) is the number of time periods in the time series.

Monte Carlo Simulation Function

source("functions/mcSimulation.R")
## The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
## which was just loaded, will retire in October 2023.
## Please refer to R-spatial evolution reports for details, especially
## https://r-spatial.org/r/2023/05/15/evolution4.html.
## It may be desirable to make the sf package available;
## package maintainers should consider adding sf to Suggests:.
## The sp package is now running under evolution status 2
##      (status 2 uses the sf package in place of rgdal)
## 
## Attaching package: 'decisionSupport'
## The following objects are masked _by_ '.GlobalEnv':
## 
##     discount, vv
## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE

The mcSimulation function performs Monte Carlo simulations to estimate model outputs based on provided parameters and a model function.

Monte Carlo Simulation Process

The Monte Carlo simulation process generates a set of estimated model outputs based on random input samples, providing a distribution of potential outcomes including:

  • random input generation with \(x_1, x_2, \ldots, x_n\) as a set of \(n\) random input samples drawn from a given distribution, representing the input parameters of the model.

  • a model function \(f(x)\) that maps the input data \(x\) to the corresponding model output \(y\). This can be expressed as:

    \[ y_i = f(x_i), \quad i = 1, 2, \ldots, n \]

  • Estimated model outputs \(\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n\) obtained by applying the model function to the random input samples:

    \[ \hat{y}_i = f(x_i), \quad i = 1, 2, \ldots, n \]

Urban Hanoi school garden simulation

# Source our model
source("CODAS_Garden_Model.R")

# Ensure consistent results with the random number generator
# not for each 'run' of the MC simulation but for 
# consistency each time we call on the simulation 
set.seed(1234) 

garden_simulation_results <- mcSimulation(
  estimate = estimate_read_csv("inputs_school_garden.csv"),
  model_function = school_garden_function,
  numberOfModelRuns = 1000, #run 1000 times
  functionSyntax = "plainNames"
)

The way we present NPV values can influence decision makers. The same information presented in different ways can even lead to different decisions. Here we derive a plot that compares the decision options as pure NPV values.

source("functions/plot_distributions.R")
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
plot_distributions(mcSimulation_object = garden_simulation_results, 
                                    vars = c("NPV_garden", "NPV_no_garden"),
                                    method = 'hist_simple_overlay', 
                                    base_size = 7, 
                                    x_axis_name = "Comparative NPV outcomes")

Framing the outcomes

Under Prospect Theory the way we present NPV values can influence decision makers - the same information presented in different ways can lead to different decisions. For example, framing a projected NPV gain as a “reduction in potential loss” might make it more attractive to decision makers due to loss aversion.

Here we plot the distribution for the decision and frame the projected NPV gain for the ‘decision’ (distributions for the two options with the NPV values of the no garden option subtracted from those for the garden). If we display this as a “Reduction in potential loss” it might be more attractive to decision makers due to loss aversion, i.e. school boards might put more emphasis on avoiding potential losses than on seeking gains. We can frame our results as a strategy that minimizes losses rather than one that maximizes gains.

plot_distributions(mcSimulation_object = garden_simulation_results, 
                                    vars = "decision",
                                    method = 'hist_simple_overlay', 
                                    base_size = 7,  
                                    x_axis_name = "Reduction in potential loss")

Summary of results for the decision

Summary

Use gt_plt_summary() from {gtExtras}

library(gtExtras)
library(svglite)
# Subset the outputs from the mcSimulation function (y) to summarize only on the variables that we want.
mcSimulation_summary <- data.frame(garden_simulation_results$x[2:55], 
                                 garden_simulation_results$y[1:3])

gt_plt_summary(mcSimulation_summary) 
mcSimulation_summary
1000 rows x 57 cols
Column Plot Overview Missing Mean Median SD
discount_rate 3.49.4 0.0% 6.5 6.5 0.9
size_of_garden 28123 0.0% 75.2 75.2 14.9
CV_value 0.010.53 0.0% 0.3 0.3 0.1
inflation_rate 2.912.2 0.0% 7.5 7.5 1.5
if_students_like 0.390.90 0.0% 0.6 0.6 0.1
if_parents_like 0.351.00 0.0% 0.7 0.7 0.1
if_community_likes 0.010.98 0.0% 0.5 0.5 0.2
if_effective_manage 0.410.82 0.0% 0.6 0.6 0.1
if_garden_yield_enough 0.160.76 0.0% 0.5 0.5 0.1
if_garden_healthy 0.290.99 0.0% 0.7 0.7 0.1
if_teachers_like 0.031.00 0.0% 0.5 0.5 0.2
if_effective_teaching 0.021.00 0.0% 0.6 0.6 0.2
if_effective_training 0.020.99 0.0% 0.5 0.5 0.2
if_offer_green_space 0.320.99 0.0% 0.7 0.7 0.1
if_reduce_polution 0.080.64 0.0% 0.3 0.3 0.1
if_biophysical_good 0.030.65 0.0% 0.3 0.3 0.1
equipment_cost 31120 0.0% 74.4 74.8 15.6
construction_cost 439 0.0% 22.6 22.6 4.8
garden_designing_costs 7.817.0 0.0% 12.5 12.5 1.6
teacher_training_cost 126 0.0% 12.5 12.7 4.5
school_board_planning 314 0.0% 9.0 9.0 1.8
teaching_equipment 313 0.0% 7.6 7.5 1.5
compost_starting 313 0.0% 7.5 7.5 1.5
worm_starting 0.96.4 0.0% 3.4 3.4 0.9
livestock_costs 0.66.5 0.0% 3.5 3.5 0.9
if_family_pays_establishment 0.100.61 0.0% 0.3 0.4 0.1
establishment_family_portion_paid 0.080.66 0.0% 0.3 0.3 0.1
maintaining_labor 1749 0.0% 32.6 32.4 4.6
teacher_salary_cost 1534 0.0% 24.9 25.0 3.1
teaching_equipment_annual 313 0.0% 7.6 7.6 1.5
teaching_tools 0.66.0 0.0% 3.5 3.5 0.9
seed_costs 0.72.6 0.0% 1.5 1.5 0.3
fertilizer 0.52.5 0.0% 1.5 1.5 0.3
plant_protection 0.76.2 0.0% 3.5 3.6 0.9
livestock_maint 013 0.0% 6.0 6.1 2.4
annual_teacher_training 2.36.1 0.0% 4.0 4.0 0.6
if_school_has_canteen 0.080.61 0.0% 0.3 0.3 0.1
canteen_savings 213 0.0% 7.5 7.5 1.6
sale_of_yield 139 0.0% 20.1 20.1 6.0
extra_cirricular_savings 0150 0.0% 60.9 61.6 24.4
formal_edu_savings 3138 0.0% 60.0 61.0 24.4
outside_investment_value 01K 0.0% 174.0 124.5 166.7
increased_enrollment_value 1138 0.0% 53.0 51.2 27.0
if_increase_tuition 0.030.42 0.0% 0.2 0.2 0.1
tuition_increase 3.012.2 0.0% 7.4 7.5 1.5
child_veg_access 3.311.6 0.0% 7.5 7.5 1.6
child_healthier_choices 2676 0.0% 293.5 291.4 119.0
green_space_value 51254 0.0% 150.2 149.9 30.9
reduce_polution_value 225 0.0% 14.8 14.7 3.2
school_event_value 075 0.0% 29.8 29.6 12.8
value_of_non_garden_land_use 1063 0.0% 35.0 34.8 9.0
if_parking 0.010.24 0.0% 0.1 0.1 0.0
parking_value 47250 0.0% 150.2 149.1 31.2
costs_of_non_garden_land_use 0.07.1 0.0% 3.0 3.0 1.2
NPV_garden 14K 0.0% 1,188.6 1,103.0 545.9
NPV_no_garden 922K 0.0% 324.6 246.1 271.5
decision -8414K 0.0% 864.0 842.6 603.0

Summary of the savings

summary(garden_simulation_results$y$decision)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -840.7   488.7   842.6   864.0  1209.1  3778.0

Summary of costs

summary(garden_simulation_results$y$total_costs)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   519.1   611.1   643.8   642.6   672.7   834.9

First year

summary(garden_simulation_results$y$Cashflow_garden1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -135.75   24.15   82.87   98.77  156.99  687.72

Cashflow of the garden option

source("functions/plot_cashflow.R")
plot_cashflow(mcSimulation_object = garden_simulation_results, 
              cashflow_var_name = "Cashflow_garden")

EVPI

# Subset the outputs from the mcSimulation function (y) by selecting the correct variables be sure to run the multi_EVPI only on the variables that the we want.
mcSimulation_table <- data.frame(garden_simulation_results$x, 
                                 garden_simulation_results$y[1:3])
source("functions/multi_EVPI.R")
evpi <- multi_EVPI(mc = mcSimulation_table, first_out_var = "NPV_garden")
## [1] "Processing 3 output variables. This can take some time."
## [1] "Output variable 1 (NPV_garden) completed."
## [1] "Output variable 2 (NPV_no_garden) completed."
## [1] "Output variable 3 (decision) completed."
source("functions/plot_evpi.R")
plot_evpi(evpi, decision_vars = "decision")
## Warning: There are no variables with a positive EVPI. You probably do not need
## a plot for that.

PLS

Projection to Latent Structures model can give us some sense of the correlation strength and direction for model variables and our outcome variables.

source("functions/pls_model.R")
pls_result <- pls_model(object = garden_simulation_results,
                                resultName = names(garden_simulation_results$y)[1], 
                                ncomp = 1)

input_table <- read.csv("inputs_school_garden.csv")
source("functions/plot_pls.R")
plot_pls(pls_result, input_table = input_table, threshold = 0)

The full repository can be accessed with the following QR code.